Hardware Supported Synchronization Primitives for Clusters
Authors
Abstract
Parallel architectures with shared memory are well suited to many applications, provided that efficient shared memory access and process synchronization mechanisms are available. When the parallel machine is a cluster with physically distributed memory, software-based synchronization mechanisms combined with the virtual memory infrastructure can implement Software Distributed Shared Memory (S-DSM), a shared memory abstraction on a distributed memory machine. However, the additional load that this emulation places on the communication network can limit the performance of such systems. This problem motivated our research, in which we developed a set of synchronization primitives for S-DSM on reconfigurable hardware. This hardware implements an auxiliary synchronization network that operates in parallel with the data communication network. Experiments comparing our hardware implementation with a software one showed that our system increases the performance of these S-DSM primitives by a factor of 40 or more.
Similar resources
Lightweight 4x4 MDS Matrices for Hardware-Oriented Cryptographic Primitives
The linear diffusion layer is an important part of lightweight block ciphers and hash functions. This paper presents an efficient class of lightweight 4x4 MDS matrices whose implementation cost equals that of their corresponding inverses. The main target of the paper is hardware-oriented cryptographic primitives, and the implementation cost is measured in terms of the required number ...
A Comparison of Software and Hardware Synchronization Mechanisms for Distributed Shared Memory Multiprocessors
Efficient synchronization is an essential component of parallel computing. The designers of traditional multiprocessors have included hardware support only for simple operations such as compare-and-swap and load-linked/store-conditional, while high-level synchronization primitives such as locks, barriers, and condition variables have been implemented in software. With the advent of directory-based dis...
Improving the Throughput of Synchronization by Insertion of Delays
Efficiency of synchronization mechanisms can limit the parallel performance of many shared-memory applications. In addition, the ever increasing performance gap between processor and interprocessor communication may further compromise the scalability of these primitives. Ideally, synchronization primitives should provide high performance under both high and low contention without requiring subs...
Synchronization for Dynamic Task Parallelism on Manycore Architectures (University of Delaware, Department of Electrical and Computer Engineering, Computer Architecture and Parallel Systems Laboratory)
Manycore architectures (hundreds to thousands of cores per processor) are seen by many as a natural evolution of multicore processors. Taking advantage of this massive parallelism in practice requires a productive programming interface for parallel programming and an efficient execution and thread coordination runtime. Dynamic task parallelism, introduced recently in several programming langu...
Hardware Support for Synchronization in the Scalable Coherent Interface (SCI)
The exploitation of the inherent parallelism in applications depends critically on the efficiency of the synchronization and data exchange primitives provided by the hardware. This paper discusses and analyses such primitives as they are implemented in a pending IEEE standard 1596 for communication in a shared memory multiprocessor, the Scalable Coherent Interface (SCI). The SCI synchronization p...
Journal title:
Volume, issue
Pages -
Publication date: 2008